
Added support for log_model='best_and_last' option in wandb logger #9356

Conversation

@ohayonguy (Author) commented Sep 7, 2021

What does this PR do?

As a followup of the discussion #9342 (reply in thread)

Currently, the Weights & Biases logger can log checkpoints as artifacts either at the end of training or during training (any time a new checkpoint is created by the ModelCheckpoint callback). This behavior is controlled by the log_model flag: log_model=True logs at the end of training, and log_model='all' logs every checkpoint during training. However, the latter option can blow up the artifact storage, since many checkpoints may be saved, which is undesirable especially when using wandb local.

Thus, I added a log_model='best_and_last' option, which keeps only the best and the last checkpoints in the wandb artifact storage. Previous artifact versions are automatically deleted.

No extra dependencies are required. Only wandb.
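For readers skimming this PR, here is a minimal usage sketch of the proposed option (the 'best_and_last' value exists only in this PR, and the project/monitor names below are made up):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import WandbLogger

# 'best_and_last' is the value added by this PR: only the checkpoint
# versions aliased 'best' and 'latest' are kept as W&B artifact versions;
# older, unaliased versions are deleted automatically.
wandb_logger = WandbLogger(project="my-project", log_model="best_and_last")
checkpoint_callback = ModelCheckpoint(monitor="val_loss", mode="min")
trainer = Trainer(logger=wandb_logger, callbacks=[checkpoint_callback], max_epochs=10)
# trainer.fit(model)  # model: any LightningModule
```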

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun? yup :)

Make sure you had fun coding 🙃

codecov bot commented Sep 7, 2021

Codecov Report

Merging #9356 (e0b0fa7) into master (392c577) will decrease coverage by 0%.
The diff coverage is 67%.

@@          Coverage Diff           @@
##           master   #9356   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files         178     178           
  Lines       14895   14900    +5     
======================================
+ Hits        13138   13141    +3     
- Misses       1757    1759    +2     

logger.experiment.project_name.return_value = "project"
trainer = Trainer(default_root_dir=tmpdir, logger=logger, max_epochs=2, limit_train_batches=3, limit_val_batches=3)
trainer.fit(model)
assert wandb.init().log_artifact.call_count == 2
Member

this does not test that the other versions are properly removed. Can we also test/mock this somehow?

@ohayonguy (Author) Sep 7, 2021

For sure. I will figure out how this should be properly tested.
Do you think that this should be done with mocking? @justusschock
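Not part of the PR, just a minimal sketch of how the deletion could be asserted with unittest.mock. The delete_unaliased_versions helper below is a hypothetical stand-in that mirrors the cleanup loop added in this PR (shown in the diff further down); a real Lightning test would instead patch wandb and drive Trainer.fit:

```python
from unittest import mock

def delete_unaliased_versions(api, artifact_name):
    # Stand-in for the PR's cleanup: drop every artifact version that
    # carries no alias ('latest'/'best' survive).
    for version in api.artifact_versions(artifact_name, "model"):
        if len(version.aliases) == 0:
            version.delete()

def test_only_unaliased_versions_are_deleted():
    aliased = mock.MagicMock(aliases=["latest", "best"])
    stale = mock.MagicMock(aliases=[])
    api = mock.MagicMock()
    api.artifact_versions.return_value = [aliased, stale]

    delete_unaliased_versions(api, "model-123")

    # Only the version without aliases should have been deleted.
    stale.delete.assert_called_once()
    aliased.delete.assert_not_called()
```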

Contributor

Maybe you should train for 5 epochs in this test

@justusschock (Member)

@borisdayma mind reviewing this PR?

@tchaton added the wandb label Sep 7, 2021
Comment on lines +314 to +319
api = wandb.Api(overrides={"project": self.experiment.project})

for version in api.artifact_versions(f"model-{self.experiment.id}", "model"):
# Clean up all versions that don't have an alias such as 'latest'.
if len(version.aliases) == 0:
version.delete()
Contributor

Noob question: I am not familiar with the wandb API.

It seems models are being versioned, and you are deleting all versions which don't have either the latest or best alias.

I am not sure I grasp why this would save only the best and last model weights.

Furthermore, I don't think this would work with multiple ModelCheckpoint callbacks. Should we save the monitor as metadata to perform the filtering?

best_and_last should produce at most num_model_checkpoints + 1 checkpoints, right?

Contributor

We automatically tag some artifact versions (model checkpoints). We tag the "latest", and we tag the "best" when a monitored value is defined (they can point to the same model). So at most 2 versions are tagged.
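For anyone unfamiliar with the wandb side, a rough sketch (assumed project name and checkpoint path, not the logger's actual code) of how a checkpoint artifact picks up these aliases; each upload creates a new version, and only the versions currently pointed to by 'latest' and 'best' keep an alias:

```python
import wandb

run = wandb.init(project="my-project")

# Each checkpoint upload creates a new version of the same artifact;
# 'latest' always moves to the newest version, and 'best' moves when
# the monitored metric improves, so at most two versions stay aliased.
artifact = wandb.Artifact(name=f"model-{run.id}", type="model")
artifact.add_file("checkpoints/epoch=3-step=120.ckpt")  # placeholder path
run.log_artifact(artifact, aliases=["latest", "best"])
```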

Author

@borisdayma Yeah, exactly. The implementation relies on the fact that there are at most 2 live aliases at the same time.
Several aliases can probably be included (best_0, best_1, best_2, ..., latest), but this is another PR :)

Contributor

I believe best_{monitor_0}, best_{monitor_1}, best_{monitor_2} would be better; it would enable users to navigate their weights more easily in the wandb UI.

Contributor

The idea is that we directly leverage ModelCheckpoint to identify best metrics (easier to maintain the callback, avoid replicating the same logic, and maybe easier for users).
You can see an example here.

@borisdayma (Contributor) left a comment

Looks good to me. I'm just curious about the naming and whether we could offer this functionality in a cleaner way.

An alternative could be to just add an argument such as "remove_unused_artifacts" to the logger, which would be False by default.
Then we can just have a separate function for this purpose. What do you think?

@ohayonguy (Author)

> Looks good to me. I'm just curious about the naming and whether we could offer this functionality in a cleaner way.
>
> An alternative could be to just add an argument such as "remove_unused_artifacts" to the logger, which would be False by default.
> Then we can just have a separate function for this purpose. What do you think?

Great idea @borisdayma to make the naming cleaner. However, I think we need to take this a step further. So we would pass log_model=all and remove_unused_artifacts=True to achieve our functionality? Then what does "all" mean, and what does it refer to? It's a bit confusing to me, and I think we need to come up with a solution that is easier to interpret in terms of what's happening in the background.

Let's examine the current functionality of log_model:

  • Setting this flag to True means to save all the checkpoints ONLY WHEN TRAINING FINISHES. Why is this behavior necessary at all? In what situation would this be preferred over saving checkpoints along the way and deleting old versions, like ModelCheckpoint does? The final result is the same anyway, and the disk space usage is also the same (in the end).
  • Setting this flag to all means to save the entire history of checkpoints.

Therefore, I suggest:

  1. Change the naming of log_model to save_checkpoints_as_artifacts
  2. save_checkpoints_as_artifacts=True would mean to save checkpoints exactly like ModelCheckpoint. This means that we should add aliases like best_0, ... best_k in case save_top_k>1 is passed... (what else should we take into consideration here?)
  3. Add a flag keep_artifacts_history. If this flag is set, then checkpoints are not deleted from the artifacts history.

What do you think?
Thanks!

@borisdayma (Contributor)

So my concern was related to people with limited network bandwidth, whether upload speed or max quota before being throttled.

I have no issue with changing argument names, but that requires backward compatibility, so better to avoid it unless it's clearly beneficial.

logger = WandbLogger(log_model="best_and_last")
logger.experiment.id = "1"
logger.experiment.project_name.return_value = "project"
trainer = Trainer(default_root_dir=tmpdir, logger=logger, max_epochs=2, limit_train_batches=3, limit_val_batches=3)
Contributor

Should this test use a ModelCheckpoint?

Author

Yup. We should definitely build a better test. I am just not familiar with the whole "mocking" thing, or with tests in general. Should my test function also be decorated with mocking?

If not, then this requires some wandb default setup to run the experiment. Not sure how to get this right, as I have never written tests for a project as large as Lightning :)

@borisdayma (Contributor)

Ok so I really like the purpose of this PR.

I think there are a few things to decide:

  • should we be concerned about bandwidth and max upload quota for users? I personally don't have that issue, but the reason we have a mode log_model=True that uploads only at the end is for that exact purpose
  • depending on the answer to the previous question, we can rethink the allowed values for log_model (right now True and all)
  • should the "artifact removal" be on by default? I don't have a strong opinion about it.
  • how to integrate it best? I like keep_artifacts_history (suggested by @ohayonguy) as an arg in the callback and a separate function of the same name that will be called when the flag is on.
  • I suggest keeping the best_1, best_2… functionality (for keeping the top-k models) for a separate PR, as it is more complex and opens many more questions

@ohayonguy (Author)

> Ok so I really like the purpose of this PR.
>
> I think there are a few things to decide:
>
>   • should we be concerned about bandwidth and max upload quota for users? I personally don't have that issue, but the reason we have a mode log_model=True that uploads only at the end is for that exact purpose
>   • depending on the answer to the previous question, we can rethink the allowed values for log_model (right now True and all)
>   • should the "artifact removal" be on by default? I don't have a strong opinion about it.
>   • how to integrate it best? I like keep_artifacts_history (suggested by @ohayonguy) as an arg in the callback and a separate function of the same name that will be called when the flag is on.
>   • I suggest keeping the best_1, best_2… functionality (for keeping the top-k models) for a separate PR, as it is more complex and opens many more questions

Regarding the network bandwidth, I agree that this might be a problem for some people.
However, logging the checkpoints only at the end of training is not that safe. In addition, if someone really has a network bandwidth issue, they can reference artifacts in their local file system instead of uploading them to wandb (the checkpoints are saved by ModelCheckpoint anyway).
Despite all that, I think we should still keep the functionality to upload only at the end of training (why not?).

How about this:

  • log_model could be either True or False. If it is set to True, then you can play with the way artifacts are uploaded with a different flag:
  • artifacts_history could be all or most_recent, and it is self-explanatory. This flag is relevant not only for log_model but for all possible artifacts. This logger might be expanded in the future to collect more kinds of artifacts, and this flag would determine whether history is preserved for all of them.

What do you think, @borisdayma @tchaton? As in all engineering, there will be a tradeoff eventually...
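Purely to illustrate what this proposal would look like to a user (both the artifacts_history argument and this meaning of log_model are hypothetical; neither exists in WandbLogger today):

```python
from pytorch_lightning.loggers import WandbLogger

# Hypothetical API from the proposal above -- not real WandbLogger arguments.
logger = WandbLogger(
    project="my-project",
    log_model=True,                   # upload checkpoints as artifacts at all
    artifacts_history="most_recent",  # or "all" to keep every artifact version
)
```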

@tchaton (Contributor) commented Sep 10, 2021

> Ok so I really like the purpose of this PR.
>
> I think there are a few things to decide:
>
>   • should we be concerned about bandwidth and max upload quota for users? I personally don't have that issue, but the reason we have a mode log_model=True that uploads only at the end is for that exact purpose
>   • depending on the answer to the previous question, we can rethink the allowed values for log_model (right now True and all)
>   • should the "artifact removal" be on by default? I don't have a strong opinion about it.
>   • how to integrate it best? I like keep_artifacts_history (suggested by @ohayonguy) as an arg in the callback and a separate function of the same name that will be called when the flag is on.
>   • I suggest keeping the best_1, best_2… functionality (for keeping the top-k models) for a separate PR, as it is more complex and opens many more questions

Hey @borisdayma, quick question about wandb internals. Are the models uploaded within a separate thread/process? What would be the impact on training of logging all models?

Best,
T.C

@borisdayma (Contributor)

> Hey @borisdayma, quick question about wandb internals. Are the models uploaded within a separate thread/process? What would be the impact on training of logging all models?

It is performed in a separate process, so it should not impact training at all (even if the wandb process were to fail).

> How about this:
>
>   • log_model could be either True or False. If it is set to True, then you can play with the way artifacts are uploaded with a different flag:
>   • artifacts_history could be all or most_recent, and it is self-explanatory. This flag is relevant not only for log_model but for all possible artifacts. This logger might be expanded in the future to collect more kinds of artifacts, and this flag would determine whether history is preserved for all of them.

That's a great idea! In addition, I suggest:

  • log_model=True logs during training by default (more convenient), so it's the same as the current log_model='all'
  • log_model='all' still works but will be deprecated in the future (warning for now; see the sketch after this list)
  • logging only at the end needs a new flag or a new argument
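A minimal sketch of the deprecation path suggested in the second bullet above; the _normalize_log_model helper is hypothetical and not the PR's code:

```python
import warnings

def _normalize_log_model(log_model):
    # Hypothetical normalization: 'all' keeps working for now but warns,
    # since True would already mean "log checkpoints during training".
    if log_model == "all":
        warnings.warn(
            "log_model='all' is deprecated; use log_model=True, which now "
            "logs checkpoints during training.",
            DeprecationWarning,
        )
        return True
    return bool(log_model)
```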

@tchaton (Contributor) commented Sep 14, 2021

Hey @borisdayma, @ohayonguy,

Yes, I like the proposal too. Should we adapt this PR directly with the suggested changes?

Best,
T.C

@borisdayma (Contributor)

As you guys want! I can work on it later this week unless @ohayonguy wants to update this PR?

@ohayonguy (Author)

> As you guys want! I can work on it later this week unless @ohayonguy wants to update this PR?

Hey guys @borisdayma @tchaton
I can definitely work on it (and would be happy to). However, it could take a while, as I am currently super busy with my internship.
Sorry for the late response.

@Borda added the 3rd party (Related to a 3rd-party) label Sep 23, 2021
@Borda removed the wandb label Sep 23, 2021
@awaelchli added the feature (Is an improvement or enhancement) and logger (Related to the Loggers) labels Sep 27, 2021
stale bot commented Oct 12, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.

stale bot added the won't fix (This will not be worked on) label Oct 12, 2021

stale bot commented Oct 17, 2021

This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.

stale bot closed this Oct 17, 2021
@sh3rlock14

Hi everyone 👋🏻!
I see that this PR is now closed and the proposed feature didn't make it into newer versions of PL: did you choose to keep the behaviour of WandbLogger as it is, or is something new being developed?

Thanks for your help!

@Buckler89

Any updates on this? Will this feature be merged?
I think this is important since logging the model during training helps protect against crashes and allows the experiment to be resumed from its last saved checkpoint.

@taczin commented Feb 21, 2025

+1 for this feature. Anyone wanting to train a large model will be interested in this. Due to the large model size, keeping a full history of checkpoints takes up too much disk space, but waiting until the end of training to upload the top-k checkpoints is too vulnerable to crashes.

Labels: 3rd party (Related to a 3rd-party), feature (Is an improvement or enhancement), has conflicts, logger (Related to the Loggers), won't fix (This will not be worked on)